During COVID-19, wearing a mask was mandated worldwide in many 
workplaces, departments, and offices. New deep learning classifiers based on 
convolutional neural networks (CNNs) have been proposed to increase the 
validation accuracy of face mask detection. This work introduces a face 
mask model that recognizes whether or not a person is wearing a mask. The 
proposed model detects and recognizes face masks in two stages: in the first 
stage, a Haar cascade detector detects the face; in the second stage, the 
proposed CNN model, built from scratch, classifies it. The experiments were 
conducted on the masked faces (MAFA) dataset using 160x160-pixel RGB 
images. Compared with other algorithms applied to face mask recognition, 
the model has lower computational complexity, fewer layers, and higher 
reliability. The findings show that the model's validation accuracy reaches 
97.55% with a learning rate of 0.001 and a 250-value feature vector, and up 
to 98.43% when the feature vector in the dense (fully connected) layer of the 
proposed CNN is enlarged to 1500 values. Finally, the proposed model improves 
recognition performance metrics such as precision, recall, and area under 
the curve (AUC).
1. INTRODUCTION 

The coronavirus disease of 2019 (COVID-19) has wreaked havoc around the globe. Wearing face masks in 
public areas is a crucial protective step. To prevent the infection from spreading, some districts have made 
wearing a mask mandatory in public places. Only a few studies have used image analysis to recognize face 
masks automatically. A lightweight, deep learning-based face mask detector was presented for embedded 
systems with low computational requirements and high productivity [1]. 

Face detection is a computer vision technique for locating human faces in images. It is a prerequisite 
for successfully classifying human faces. Furthermore, practical face recognition applications require 
real-time operation on low-cost devices. Face detection has been approached with a variety of traditional 
methods, but their accuracy is limited. The success of deep learning in extracting features from objects has, 
by contrast, encouraged its use to distinguish faces from backgrounds [2]. 

To better detect small faces, a new scale-invariant face detector, the small faces attention (SFA) 
detector, has been developed. It is a multi-branch face detection system that prioritizes small-scale faces. 
Feature maps from neighboring branches can be fused with the large-scale feature maps, which makes small, 
hard faces easier to detect. Finally, multi-scale training and testing were used together to make the model 
robust across many scales. 
According to extensive experiments, SFA significantly improves face detection performance, especially on 
small faces. On challenging face detection benchmarks such as WIDER FACE and the face detection data set and 
benchmark (FDDB), this technique achieves good detection performance at competitive speed [3], [4]. 
Convolutional neural networks trained on large amounts of data have greatly improved face detection. 
However, because of basic problems such as high computational cost and long computation time, current 
convolutional neural network (CNN) based face detectors are unsuitable for many applications. 

A real-time face detection solution was built using a single end-to-end deep neural network with 
multi-scale feature maps, multiple prior aspect ratios, and confidence rectification. The multi-scale feature 
maps alleviate the difficulty of detecting small faces, the multiple prior aspect ratios reduce processing 
costs, and the confidence rectification, which is in line with biological intuition, can further improve the 
detection rate. On the public FDDB benchmark, this method outperformed the latest CNN-based algorithms 
while having fewer limitations [5], [6]. Face detection is critical to the development of facial recognition, 
expression analysis, tracking, and classification. 

Traditional techniques struggle in a variety of difficult scenarios, such as non-frontal faces, 
occlusions, and complex backgrounds. CNN techniques produce remarkable results but require a large number 
of computations. As a result, CNNs are poorly suited to low-cost CPUs and require high-end hardware. 

A lightweight CNN architecture for real-time face detection was developed in [7], [8]. It consists of 
two main modules: a backbone for extracting facial features and a multi-layer detection module for making 
predictions at several scales. It can also process images at video graphics array (VGA) resolution. 

A robust rotation-invariant multi-view face detector was built from several proposed components: the 
width-first-search (WFS) tree detector structure, the Vector Boosting algorithm for learning strong classifiers 
with vector output, a weak learning method based on domain partitioning, sparse features in granular space, 
and a heuristic search for sparse feature selection. As a result, the multi-view face detector achieves low 
computational complexity, a wide detection range, and high detection accuracy on both standard test sets and 
real-life photographs [9]. Table 1 (see Appendix) summarizes related works and their contributions to face 
mask detection and recognition. The main contribution of this work is a face mask model that recognizes 
whether or not a person is wearing a mask. The proposed model detects and recognizes face masks in two 
stages: in the first stage, the Haar cascade detector detects the face; in the second stage, the proposed CNN 
model, built from scratch, performs the classification. The model achieves lower computational complexity 
with fewer layers. The proposed model improves the recognition performance metrics of precision, recall, 
and area under the curve (AUC) compared with other methods that use different algorithms on the masked 
faces (MAFA) dataset. MAFA is a masked face detection benchmark whose images were collected from the 
Internet; it contains 30,811 images and 35,806 masked faces. 

This paper is organized as follows. Section 1 describes the research background on face masks, a 
critical strategy for minimizing COVID-19 transmission and saving lives during the COVID-19 outbreak, 
together with face recognition systems and a literature survey of face recognition techniques that adopt 
improved convolutional neural networks. Section 2 presents the architecture of the proposed face mask 
detection model, which consists of two stages: face detection and classification (with and without mask). 
Section 3 presents the experimental results and discussion in three subsections: the first describes the 
MAFA dataset used to prepare the data, which contains images of masked and unmasked faces; the second 
describes training on the MAFA dataset; and the third compares the classification performance with other 
methods that use different algorithms on the MAFA dataset. Section 4 presents the main contributions and 
conclusions of this work. 


2. METHOD 

The proposed face mask detection model consists of two stages, face detection and classification 
(with and without mask), as shown in Figure 1. The face detection stage locates the face region using the 
Viola-Jones Haar cascade detector [10], which extracts Haar-like features with a 160x160 window. The 
detected face regions are cropped and then fed to the proposed CNN classification model. This less complex 
and more reliable model, built from scratch, is applied to the MAFA dataset images to classify face regions as 
masked or unmasked. The proposed model and its architecture are shown in Table 2 and Table 3. It consists of 
convolutional, batch normalization, activation, max pooling, and fully connected (dense) layers, in which 
every neuron is connected to all neurons of the previous layer [11]. 
Figure 1. Procedure of the proposed model: Haar cascade detection, followed by the proposed CNN model 
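The two-stage procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the face detector and the mask classifier are passed in as hypothetical callables (`detect_faces`, `predict_mask_prob`). In practice the boxes could come from OpenCV's `CascadeClassifier.detectMultiScale`, which returns (x, y, w, h) rectangles, and the probability from the trained CNN. Nearest-neighbour resizing to 160x160, the 0.5 threshold, and reading the sigmoid output as the probability of "mask" are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]              # (R, G, B)
Image = List[List[Pixel]]                 # rows of pixels
Box = Tuple[int, int, int, int]           # (x, y, w, h), as returned by detectMultiScale

def crop(image: Image, box: Box) -> Image:
    """Cut a detected face region out of the image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def resize_nearest(image: Image, size: int = 160) -> Image:
    """Nearest-neighbour resize to size x size (the CNN expects 160x160x3)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def detect_and_classify(image: Image,
                        detect_faces: Callable[[Image], Sequence[Box]],
                        predict_mask_prob: Callable[[Image], float],
                        threshold: float = 0.5) -> List[Tuple[Box, str]]:
    """Stage 1 finds faces; stage 2 classifies each cropped, resized face."""
    results = []
    for box in detect_faces(image):
        face = resize_nearest(crop(image, box))
        prob = predict_mask_prob(face)          # assumed: sigmoid output = P(mask)
        label = "mask" if prob >= threshold else "no mask"
        results.append((box, label))
    return results
```

Keeping the detector and classifier as parameters makes the two stages independent, so either one can be swapped without touching the cropping logic.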


Table 2. The proposed CNN model 

Layer (type) | Output shape | Parameters 
conv2d (Conv2D) | (None, 158, 158, 32) | 896 
conv2d_1 (Conv2D) | (None, 154, 154, 32) | 25632 
conv2d_2 (Conv2D) | (None, 150, 150, 64) | 51264 
batch_normalization (BatchNormalization) | (None, 150, 150, 64) | 256 
activation (Activation) | (None, 150, 150, 64) | 0 
max_pooling2d (MaxPooling2D) | (None, 75, 75, 64) | 0 
conv2d_3 (Conv2D) | (None, 73, 73, 64) | 36928 
batch_normalization_1 (BatchNormalization) | (None, 73, 73, 64) | 256 
activation_1 (Activation) | (None, 73, 73, 64) | 0 
max_pooling2d_1 (MaxPooling2D) | (None, 36, 36, 64) | 0 
conv2d_4 (Conv2D) | (None, 34, 34, 64) | 36928 
batch_normalization_2 (BatchNormalization) | (None, 34, 34, 64) | 256 
activation_2 (Activation) | (None, 34, 34, 64) | 0 
max_pooling2d_2 (MaxPooling2D) | (None, 17, 17, 64) | 0 
flatten (Flatten) | (None, 18496) | 0 
dense (Dense) | (None, 250) | 4624250 
activation_3 (Activation) | (None, 250) | 0 
dropout (Dropout) | (None, 250) | 0 
dense_1 (Dense) | (None, 1) | 251 
activation_4 (Activation) | (None, 1) | 0 


Table 3. Architecture of the proposed CNN model layers 

Layer | Filters (pooling: window, stride; dropout: rate) | Output image shape 
Input image | - | (160,160,3) 
Conv 1 | 32 | (158,158) 
Conv 2 | 32 | (154,154) 
Conv 3 | 64 | (150,150) 
Batch normalization 1 | 64 | (150,150) 
ReLU | - | (150,150) 
Max pooling 1 | 2x2, 2 | (75,75) 
Conv 4 | 64 | (73,73) 
Batch normalization 2 | 64 | (73,73) 
ReLU | - | (73,73) 
Max pooling 2 | 2x2, 2 | (36,36) 
Conv 5 | 64 | (34,34) 
Batch normalization 3 | 64 | (34,34) 
ReLU | - | (34,34) 
Max pooling 3 | 2x2, 2 | (17,17) 
Flatten | - | 18496 values 
Dense 1 | - | 250 units 
ReLU | - | 250 units 
Dropout | 0.5 | 250 units 
Dense 2 | - | 1 unit 
Sigmoid | - | 1 unit 
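The output shapes and parameter counts in Table 2 can be reproduced with a few lines of bookkeeping. This sketch rests on assumptions inferred from Table 2 rather than stated in the text: stride-1 convolutions without padding, kernel sizes of 3, 5, 5, 3, and 3 (the only sizes consistent with both the shapes and the parameter counts), and 2x2 max pooling with stride 2. Batch normalization is counted as four values per channel (gamma, beta, moving mean, moving variance), as in Keras.

```python
def conv(shape, filters, k):
    """Valid (unpadded), stride-1 convolution: new shape and weight + bias count."""
    h, w, c = shape
    return (h - k + 1, w - k + 1, filters), k * k * c * filters + filters

def batch_norm(shape):
    """Four values per channel: gamma, beta, moving mean, moving variance."""
    return shape, 4 * shape[-1]

def max_pool(shape, size=2):
    """2x2 pooling with stride 2; odd sizes are floored (73 -> 36)."""
    h, w, c = shape
    return (h // size, w // size, c), 0

# Assumed kernel sizes (inferred from Table 2): 3, 5, 5, 3, 3.
SPEC = [("conv", 32, 3), ("conv", 32, 5), ("conv", 64, 5), ("bn",), ("pool",),
        ("conv", 64, 3), ("bn",), ("pool",),
        ("conv", 64, 3), ("bn",), ("pool",)]

def summarize(input_shape=(160, 160, 3), feature_vector=250):
    """Return (layer, output shape, parameters); activations and dropout add no parameters."""
    rows, shape = [], input_shape
    for layer in SPEC:
        if layer[0] == "conv":
            shape, params = conv(shape, layer[1], layer[2])
        elif layer[0] == "bn":
            shape, params = batch_norm(shape)
        else:
            shape, params = max_pool(shape)
        rows.append((layer[0], shape, params))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense", (feature_vector,), flat * feature_vector + feature_vector))
    rows.append(("dense_1", (1,), feature_vector + 1))
    return rows

rows = summarize()
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):16s} {params}")
print("total parameters:", sum(p for _, _, p in rows))
```

Under these assumptions every row matches Table 2, and the parameters sum to 4,776,917, almost all of which (4,624,250) sit in the dense layer.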


Convolution is an operation on two functions f and g that produces a third function, written 
(f * g). Equation (1) defines the one-dimensional discrete convolution. 


$$ (f * g)(n) = \sum_{m=-\infty}^{\infty} f(m)\, g(n-m) \tag{1} $$
Convolution also applies to two-dimensional (2D) digital images. Let A be a 2D image, K a 2D filter 
(kernel), and F the resulting feature map, where (i, j) indexes the pixels of F and (m, n) runs over the filter. 
Convolving the filter K with image A produces the output F, as defined by (2). Because convolution is 
commutative, the 2D equation can equivalently be written as (3). 


$$ F(i,j) = (A * K)(i,j) = \sum_{m}\sum_{n} A(m,n)\, K(i-m,\, j-n) \tag{2} $$
$$ F(i,j) = (A * K)(i,j) = \sum_{m}\sum_{n} A(i-m,\, j-n)\, K(m,n) \tag{3} $$
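A direct, unoptimized transcription of (1) and (3) makes the index arithmetic concrete and lets commutativity be checked numerically. The function names are illustrative, and finite inputs are treated as zero outside their range. Note that CNN layers such as Keras `Conv2D` actually compute a cross-correlation (the kernel is not flipped) over the "valid" region only; the third function shows that variant, which is what shrinks a 160x160 input to 158x158 with a 3x3 kernel.

```python
def conv1d(f, g):
    """Eq. (1) for finite sequences: (f * g)(n) = sum_m f(m) g(n - m)."""
    out = [0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for k, gk in enumerate(g):
            out[m + k] += fm * gk          # n = m + k, so g(n - m) = g(k)
    return out

def conv2d_full(A, K):
    """Eq. (3): F(i, j) = sum_m sum_n A(i - m, j - n) K(m, n), full output size."""
    H, W, M, N = len(A), len(A[0]), len(K), len(K[0])
    F = [[0] * (W + N - 1) for _ in range(H + M - 1)]
    for i in range(H + M - 1):
        for j in range(W + N - 1):
            total = 0
            for m in range(M):
                for n in range(N):
                    if 0 <= i - m < H and 0 <= j - n < W:
                        total += A[i - m][j - n] * K[m][n]
            F[i][j] = total
    return F

def cross_correlate_valid(A, K):
    """What CNN layers compute in practice: no kernel flip, 'valid' region only."""
    H, W, M, N = len(A), len(A[0]), len(K), len(K[0])
    return [[sum(A[i + m][j + n] * K[m][n] for m in range(M) for n in range(N))
             for j in range(W - N + 1)]
            for i in range(H - M + 1)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
K = [[1, 0], [0, -1]]
print(conv2d_full(A, K) == conv2d_full(K, A))   # commutativity of (2) and (3)
```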


The rectified linear unit (ReLU) activation function, f(x) = max(0, x), sets negative values to zero. 
Adaptive moment estimation (Adam) is an optimization algorithm that can be used instead of classical 
stochastic gradient descent (SGD) to update network parameters based on training data [10]. 

Adam has been shown empirically to converge faster than other algorithms. It combines RMSprop 
with SGD with momentum: as in RMSprop, the squared gradients are used to scale the learning rate, and as in 
SGD with momentum, a moving average of the gradient is used instead of the gradient itself. Adam is an 
adaptive learning rate method; in other words, it computes an individual learning rate for each parameter. 
Its name, derived from adaptive moment estimation, reflects the fact that it uses the first and second moments 
of the gradient to adapt the learning rate for each weight of the neural network [12]. The n-th moment of a 
random variable is defined as the expected value of that variable raised to the power n, as in (4). 


$$ m_n = E\left[X^n\right] \tag{4} $$


where m_n is the n-th moment and X is the random variable. Because the gradient of the neural network's 
cost function is evaluated on a small random batch of data, it can be treated as a random variable. The first 
and second moments are the mean and the uncentered variance, respectively. To estimate these moments, 
Adam uses exponential moving averages computed from the gradient evaluated on the current mini-batch, as 
in (5) and (6). 


$$ m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t \tag{5} $$
$$ v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \tag{6} $$


where m_t and v_t are the moving averages, g_t is the gradient on the current mini-batch, and β1 and β2 are 
hyperparameters introduced by the algorithm, with good default values of 0.9 and 0.999, respectively. The 
moving-average vectors are initialized with zeros at the first iteration, and m_t and v_t serve as estimates of 
the first and second moments, as in (7) and (8). 


$$ E[m_t] = E[g_t] \tag{7} $$
$$ E[v_t] = E[g_t^2] \tag{8} $$


An estimator is unbiased when its expected value equals the parameter being estimated; (7) and (8) 
express exactly this requirement for m_t and v_t. However, because the moving averages start at zero, they 
are biased toward zero, especially in the first iterations. Unrolling (5) shows that every past gradient 
contributes to m_t, with the earliest gradients contributing least because they are multiplied by ever smaller 
powers of β1. The moving average can therefore be written as in (9). 


$$ m_t = (1-\beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i}\, g_i \tag{9} $$


Taking the expected value of m_t reveals its relation to the true first moment and shows how the 
discrepancy can be corrected, as in (10), (11), and (12), where ζ is a small term that is zero when the first 
moment of the gradient is stationary. 


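$$ E[m_t] = E\left[(1-\beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i}\, g_i\right] \tag{10} $$
$$ E[m_t] = E[g_t]\,(1-\beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i} + \zeta \tag{11} $$
$$ E[m_t] = E[g_t]\,(1-\beta_1^{\,t}) + \zeta \tag{12} $$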
Because E[m_t] equals the true first moment scaled by (1 − β1^t), a bias correction step is needed so that 
the estimator has the desired expected value; the same argument applies to v_t with β2. Equations (13) and 
(14) give the final, bias-corrected estimators. 


$$ \hat{m}_t = \frac{m_t}{1-\beta_1^{\,t}} \tag{13} $$
$$ \hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}} \tag{14} $$
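With the default values β1 = 0.9 and β2 = 0.999 and the zero initialization m_0 = v_0 = 0, the effect of the correction at the first step follows directly from (5), (6), (13), and (14). This worked example is added for illustration and uses only those equations:

$$ m_1 = (1-\beta_1)\,g_1 = 0.1\,g_1, \qquad \hat{m}_1 = \frac{0.1\,g_1}{1-0.9} = g_1 $$
$$ v_1 = (1-\beta_2)\,g_1^2 = 0.001\,g_1^2, \qquad \hat{v}_1 = \frac{0.001\,g_1^2}{1-0.999} = g_1^2 $$
$$ \frac{m_1}{\sqrt{v_1}} = \frac{0.1\,g_1}{0.0316\,|g_1|} \approx 3.16\,\operatorname{sign}(g_1), \qquad \frac{\hat{m}_1}{\sqrt{\hat{v}_1}} = \operatorname{sign}(g_1) $$

Without the correction, the first update would therefore be about 3.16 times the learning rate; with it, the first step has magnitude close to the learning rate. The correction fades as t grows: at t = 10, 1 − β1^10 ≈ 0.651, while 1 − β2^10 ≈ 0.00996, so the second-moment correction stays significant for far longer.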


The final step uses these moving averages to scale the learning rate of each parameter individually. 
With Adam, this is done directly in the weight update rule given in (15). 

$$ w_t = w_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{15} $$

where w denotes the model weights, η is the Adam learning rate, and ε is a small constant that prevents 
division by zero. 
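The update rule in (5), (6), (13), (14), and (15) can be transcribed for a single scalar weight as follows. This is a minimal sketch, not the authors' training code (a framework such as Keras applies the same rule to every weight tensor); the toy objective f(w) = (w − 3)^2 and the function name are illustrative assumptions.

```python
import math

def adam_minimize(grad, w0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Scalar Adam following eqs. (5), (6), (13), (14), and (15)."""
    w, m, v = w0, 0.0, 0.0                      # moving averages start at zero
    for t in range(1, steps + 1):
        g = grad(w)                             # gradient on the current mini-batch
        m = beta1 * m + (1 - beta1) * g         # eq. (5): first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g     # eq. (6): second-moment estimate
        m_hat = m / (1 - beta1 ** t)            # eq. (13): bias correction
        v_hat = v / (1 - beta2 ** t)            # eq. (14): bias correction
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)   # eq. (15): weight update
    return w

def grad(w):
    """Gradient of the toy objective (w - 3)^2."""
    return 2.0 * (w - 3.0)

print(adam_minimize(grad, 0.0, lr=0.05, steps=2000))
```

Because of the bias correction, the first step moves the weight by almost exactly the learning rate (0.05 here), regardless of the gradient's scale.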

Dropout is a commonly used regularization technique that improves generalization. During each 
training iteration, neurons are dropped at random. This spreads the ability to select features evenly across 
the neurons and forces the model to learn multiple independent features. During training, a dropped neuron 
takes part in neither forward nor backward propagation, which helps avoid overfitting. At test time, the full 
network is used to make predictions. In the proposed model, the first convolutional layer outputs a 
158x158x32 feature map and the last pooling layer outputs a 17x17x64 feature map. Binary cross-entropy is 
used as the loss function; it measures how far the predicted probability, a value in the interval 0-1, is from 
the true label. Binary cross-entropy is defined as in (16) [13]. 


$$ \mathrm{loss}(a, \hat{a}) = -\left(a \log \hat{a} + (1-a) \log(1-\hat{a})\right) \tag{16} $$


where â is the value returned by the model (the sigmoid output) and a is the true label. It is common to 
minimize this loss over multiple images at the same time, averaging it over each mini-batch. 
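Dropout and the loss in (16) can both be sketched in a few lines. The dropout shown is the "inverted" variant, which is how frameworks such as Keras implement their `Dropout` layer: kept units are scaled by 1/(1 − rate) during training so that the full layer can be used unchanged at test time. The rate of 0.5 matches Table 3; the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math
import random

def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training
    and scale the survivors by 1 / (1 - rate); at test time return the input unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def binary_cross_entropy(a, a_hat, eps=1e-7):
    """Eq. (16): a is the true label (0 or 1), a_hat the sigmoid output."""
    a_hat = min(max(a_hat, eps), 1.0 - eps)      # clip to avoid log(0)
    return -(a * math.log(a_hat) + (1 - a) * math.log(1 - a_hat))

def mean_bce(labels, predictions):
    """Average loss over a mini-batch of images."""
    return sum(binary_cross_entropy(a, p) for a, p in zip(labels, predictions)) / len(labels)

print(mean_bce([1, 0, 1], [0.9, 0.2, 0.6]))
```

Note that the loss is not bounded by 1: a confident wrong prediction, such as â = 0.1 for a = 1, costs −log 0.1 ≈ 2.30.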


3. RESULTS AND DISCUSSION 

This section consists of three subsections that evaluate the proposed CNN classification model. The 
first presents the dataset used to prepare the data, which contains images of masked and unmasked faces. 
The second describes the training data and the selection of different learning rates and feature vector sizes. 
The third uses several performance measures to determine the quality and validation accuracy of the model. 


3.1. Data set 

For the experiments, the MAFA dataset was used as in [14]; it contains 35,806 masked faces with a 
minimum size of 160x160. From MAFA, 5902 images containing frontal faces were selected. These were 
divided into training and validation sets of 4759 and 1143 images, respectively. Figures 2 and 3 show 
example images from the MAFA dataset. 


Figure 3. Faces without mask in MAFA dataset 
3.2. Training 

The training procedure included two stages. The first stage used an Adam learning rate of 0.001 and 
computed all performance metrics. The second stage changed the learning rate to 0.0001 and followed the 
same procedure. The binary cross-entropy loss on the training and validation datasets was measured at each 
learning rate to assess the performance of the proposed model, using a batch size of 32. Training ran for 30 
epochs, and the feature vector size was set to 250 values. Figure 4 shows the binary cross-entropy loss on the 
training and validation datasets at a learning rate of 0.001. The resulting loss was 0.0727, lower than the 
0.1094 obtained at a learning rate of 0.0001. Figure 5 shows the training and validation accuracy. Figures 6 
and 7 show the training and validation accuracy and loss per epoch when a feature vector of 1500 values is 
selected in the dense layer. 
Figure 6. The training and validation accuracy at a feature vector of 1500 values 
Figure 7. The training and validation loss at a feature vector of 1500 values 
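For the training configuration described in Section 3.2, the number of Adam weight updates per run follows from simple arithmetic. This sketch assumes, as is typical although the text does not say so, that the final partial batch is kept rather than dropped.

```python
import math

TRAIN_IMAGES = 4759   # training split (Section 3.1)
BATCH_SIZE = 32       # batch size used in training
EPOCHS = 30           # epochs per learning-rate stage

steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)               # full batches plus one partial batch
last_batch_size = TRAIN_IMAGES - (steps_per_epoch - 1) * BATCH_SIZE  # images in the partial batch
updates_per_run = steps_per_epoch * EPOCHS                            # Adam updates per learning rate

print(steps_per_epoch, last_batch_size, updates_per_run)
```

This gives 149 steps per epoch (148 full batches and a final batch of 23 images), or 4470 weight updates for each of the two learning rates.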


3.3. Performance 

The performance of the proposed model was evaluated using the metrics given in (17), (18), (19), (20), 
(21), and (22): precision (the fraction of positive predictions that are correct); recall, or true positive rate (the 
fraction of the positive class that is correctly classified); validation accuracy (the number of correct predictions 
divided by the total number of instances evaluated); training error (the number of incorrect predictions divided 
by the total number of instances evaluated); specificity (the fraction of the negative class that is correctly 
classified); and false positive rate (FPR). The precision and area under the curve (AUC) metrics indicate the 
model's accuracy and classification quality as determined by the classification results. Recall, which measures 
the ability to find all of the relevant objects in a dataset, is given in Table 4 for different learning rates. At a 
learning rate of 0.001, the loss on the training and validation datasets was 0.0727 and the validation accuracy 
reached 97.55%. At a learning rate of 0.0001, the loss was higher (0.1094), but the validation accuracy reached 
98.34%. A 250-value feature vector was used in both cases. 


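$$ \text{Precision} = \frac{TP}{TP + FP} \tag{17} $$
$$ \text{TPR (true positive rate)} = \text{Recall} = \frac{TP}{TP + FN} \tag{18} $$
$$ \text{Validation accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{19} $$
$$ \text{Training error} = \frac{FP + FN}{TP + FP + TN + FN} \tag{20} $$
$$ \text{Specificity} = \frac{TN}{TN + FP} \tag{21} $$
$$ \text{FPR (false positive rate)} = 1 - \text{Specificity} = \frac{FP}{FP + TN} \tag{22} $$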


where a true positive (TP) means the class is positive and the model predicts positive; a true negative (TN) 
means the class is negative and the model predicts negative; a false positive (FP) means the class is negative 
but the model predicts positive; and a false negative (FN) means the class is positive but the model predicts 
negative. The number of real positives is P = TP + FN, and the number of real negatives is N = FP + TN. 
Different feature vector sizes were then tested in the dense layer during training. The validation accuracy 
reached 98.43% with a feature vector of 1500 values, with a loss of 0.0623 and a mean squared error of 0.0132. 
All performance metrics and the confusion matrix entries are given in Table 5. 
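Equations (17)-(22) can be checked directly against the confusion-matrix counts in Tables 4 and 5. The sketch below is illustrative (the function names are not from the source). It also includes a rank-based AUC, computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, which equals the area under the ROC curve; this cannot be applied to the tables because the per-image scores are not reported.

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (17)-(22) computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    specificity = tn / (tn + fp)                # eq. (21)
    return {
        "precision": tp / (tp + fp),            # eq. (17)
        "recall": tp / (tp + fn),               # eq. (18), true positive rate
        "accuracy": (tp + tn) / total,          # eq. (19)
        "error": (fp + fn) / total,             # eq. (20)
        "specificity": specificity,
        "fpr": 1 - specificity,                 # eq. (22), equals fp / (fp + tn)
    }

def auc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))

# Counts for learning rate 0.001 (Table 4).
for name, value in classification_metrics(583, 532, 15, 13).items():
    print(f"{name}: {value:.4f}")
```

The printed precision, recall, FPR, and accuracy (0.9749, 0.9782, 0.0274, 0.9755) match the first row of Table 4.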


Table 4. The confusion matrix at 30 epochs for different Adam optimizer learning rates 

Learning rate | True positive | True negative | False positive | False negative | Precision | Recall (TPR) | FPR | AUC | Validation accuracy | Loss | Mean squared error 
0.001 (best test) | 583 | 532 | 15 | 13 | 0.9749 | 0.9782 | 0.0274 | 0.9962 | 0.9755 | 0.0727 | 0.0197 
0.0001 | 585 | 539 | 8 | 11 | 0.9865 | 0.9815 | 0.0146 | 0.9928 | 0.9834 | 0.1094 | 0.0155 
Table 5. The validation accuracy and loss with different feature vector values 

Feature vector | True positive | True negative | False positive | False negative | Precision | Recall (TPR) | FPR | AUC | Validation accuracy | Loss | Mean squared error 
100 | 582 | 533 | 14 | 14 | 0.9765 | 0.9765 | 0.02559 | 0.9969 | 97.55% | 0.0769 | 0.0200 
250 | 583 | 532 | 15 | 13 | 0.9749 | 0.9782 | 0.02742 | 0.9962 | 97.55% | 0.0727 | 0.0197 
500 | 588 | 525 | 22 | 8 | 0.9639 | 0.9866 | 0.04021 | 0.9951 | 97.38% | 0.0804 | 0.0214 
900 | 589 | 529 | 18 | 7 | 0.9703 | 0.9883 | 0.03290 | 0.9957 | 97.81% | 0.0687 | 0.0169 
1200 | 585 | 530 | 17 | 11 | 0.9718 | 0.9815 | 0.03107 | 0.9952 | 97.55% | 0.0807 | 0.0209 
1500 (best test) | 588 | 537 | 10 | 8 | 0.9833 | 0.9866 | 0.01828 | 0.9966 | 98.43% | 0.0623 | 0.0132 
1600 | 584 | 530 | 17 | 12 | 0.9717 | 0.9799 | 0.03107 | 0.9963 | 97.46% | 0.0708 | 0.0188 
1700 | 587 | 529 | 18 | 9 | 0.9702 | 0.9849 | 0.03290 | 0.9959 | 97.46% | 0.0695 | 0.0181 
2000 | 580 | 535 | 12 | 16 | 0.9797 | 0.9732 | 0.02193 | 0.9948 | 97.55% | 0.0801 | 0.0193 
3000 | 582 | 518 | 29 | 14 | 0.9525 | 0.9765 | 0.05301 | 0.9947 | 96.24% | 0.0967 | 0.0276 


Figures 8 and 9 show how the different feature vector sizes selected in the dense layer of the CNN affect 
the validation accuracy and loss. Changing the feature vector size changes the accuracy and loss and 
therefore affects the quality and accuracy of the classification model. 
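Because the dense layer is fully connected to the 18,496-value flattened output (Table 2), its size grows linearly with the feature vector. The calculation below is derived from the layer shapes in Table 2 rather than reported in the source; it counts the weights and biases of the dense layer plus the single-unit output layer for each feature vector size in Table 5.

```python
FLATTEN_SIZE = 17 * 17 * 64   # 18,496 values out of the last pooling layer (Table 2)

def head_parameters(feature_vector):
    """Weights + biases of the dense layer and of the one-unit sigmoid output layer."""
    dense = FLATTEN_SIZE * feature_vector + feature_vector
    output = feature_vector + 1
    return dense + output

for size in [100, 250, 500, 900, 1200, 1500, 1600, 1700, 2000, 3000]:
    print(size, head_parameters(size))
```

For 250 values this gives 4,624,501 parameters (4,624,250 + 251, matching Table 2), and for 1500 values it gives 27,747,001, so the most accurate configuration carries roughly six times as many parameters in the classifier head.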


Table 6 compares the precision and recall of the proposed model with those of other methods that use 
different algorithms on the MAFA dataset. The proposed model reaches an accuracy of 97.55% when classifying 
160x160 images with a feature vector of 250 values, which is higher than that of the methods compared in 
Table 6; its other performance metrics are also higher, and the model showed the ability to perform faster. The 
accuracy reached 98.43% when a feature vector of 1500 values was selected in the dense layer of the proposed 
model. 
Figure 8. The validation accuracy versus feature vector size 
Figure 9. The loss versus feature vector size 


Table 6. Comparison of performance metrics between the proposed model and other algorithms 

Reference no. | Year | Precision | Recall | Accuracy 
[14] | 2017 | 0.828 | 0.89 | 89% 
[15] | 2020 | 0.83 | 0.901 | 90.1% 
Proposed model (250-value feature vector) | 2022 | 0.9749 | 0.9782 | 97.55% 
Proposed model (1500-value feature vector) | 2022 | 0.9833 | 0.9866 | 98.43% 


4. CONCLUSION 
This work presented a face mask recognition model that combines a Haar cascade face detector with a 
CNN image classification model built from scratch. Compared with other algorithms, the model showed lower 
computational complexity and higher reliability while requiring fewer layers to process 160x160-pixel RGB 
images. The proposed model improved the validation accuracy, precision, recall, and AUC on images from the 
MAFA dataset, using different values of the Adam optimizer learning rate and of the feature vector size. 
Another contribution of this work was the study of how the feature vector size influences the classification 
performance metrics, as shown by the reduced loss on the training and validation datasets. The system can be 
applied in practice in crowded buildings where masked faces must be distinguished from unmasked ones. 


APPENDIX 


Table 1. Contributions from related works that used certain techniques to detect and recognize face masks 


Ref. no. | Year | Method and techniques | Contribution | Dataset used 
[14] | 2017 | To extract candidate face regions from the input image and represent them with high-dimensional descriptors, the proposal module first concatenated two pre-trained CNNs. Using the locally linear embedding (LLE) algorithm and dictionaries, the embedding module converted these descriptors into similarity-based descriptors. Non-face regions, masked faces, and synthesized regular faces were all used in training. | Precision 82.8%, Recall 89% | MAFA FACE 
[15] | 2020 | In this real-time mask detection and recognition system, the Haar cascade classifier was used to detect the face, and the YOLOv3 algorithm was used to detect the mask and determine whether the individual was wearing it. | Acc 90.1% | MAFA FACE 
[16] | 2020 | Multi-task cascaded convolutional networks (MTCNN) were used to detect the area occluded by the mask, local binary pattern (LBP) features were extracted from the area not occluded by the mask, and these features were finally fed into a support vector machine (SVM) for face recognition. | Acc 82.6% | WIDER FACE + MAFA FACE 
[17] | 2020 | A very fast image pre-processing method that places the mask at the center of the face was proposed. Feature extraction and a CNN are used to detect and classify masked persons. | Acc 96.37% | GitHub 
[1] | 2021 | A lightweight single-face mask detector based on deep learning was proposed to meet the minimal computing requirements of embedded devices while providing high productivity. Two new strategies to improve the model's feature extraction were proposed. First, a new residual contextual attention module was presented to extract information from the rich surroundings and focus on the critical regions linked with the face mask. Second, a novel auxiliary task based on synthesized Gaussian heat map regression was proposed to learn additional discriminative features for masked and unmasked faces. | Acc 93.8% | WIDER FACE + MAFA FACE 
[18] | 2021 | A new dataset and two alternative approaches were provided to distinguish masked and unmasked faces in real time. The first technique employed an object detection model. The second technique used a YOLO face detector to detect faces (whether masked or not), followed by a unique, fast, yet effective CNN architecture to classify the faces as masked or unmasked. | Acc 97%, Precision 92%, Recall 97% | WIDER FACE + MAFA FACE 
[19] | 2021 | Median filtering and a back-propagation neural network (BPNN) were proposed for an automated face mask identification system. | Acc 91%, Sensitivity 90%, Specificity 92% | Small dataset from the Internet 
[20] | 2021 | RetinaFaceMask, the first high-performance single-stage face mask detector, was proposed. First, annotations were used to construct a new dataset, addressing the inability of previous studies to distinguish between correct and incorrect mask-wearing. Second, a contextual attention module was proposed to focus on learning the distinguishing features associated with different face mask wearing states. | Acc 94.8% | MAFA FACE + FMD FACE 
[21] | 2021 | A new mask identification and classification technique combining transfer learning and deep learning was proposed. It mixes transfer learning with Efficient-YOLOv3, which uses EfficientNet as the backbone feature extraction network and CIoU as the loss function, to reduce the number of network parameters and improve mask detection accuracy. | Acc 96.03% | MAFA FACE + WIDER FACE 
[22] | 2021 | A method was presented for partially reconstructing features of the occluded part of the face. The face was recognized using an existing deep learning approach. The occluded part of the face was first removed, and Principal Component Analysis (PCA) was applied to the remaining part of the face. | Acc 85%-95% | YALE FACE 
[23] | 2021 | Although the initial dataset was fairly limited, the CNN model that applied transfer learning achieved exceptional accuracy. First, a large face mask detection dataset was used to train the model; then the much smaller initial face mask detection dataset was used to fine-tune the previously trained model. | Acc 97.1% | Kaggle 
[24] | 2021 | Two less complex convolutional models were presented. A Raspberry Pi 4 module was used to build convolutional neural network (CNN) based classifiers that successfully classify the data and recognize persons in real time. | Acc 97.67% | Special dataset (only 1000 face masks) 
[25] | 2021 | To achieve short inference time and high accuracy, the proposed technique used an ensemble of one- and two-stage detectors. ResNet50 was used as the baseline, and transfer learning was used to incorporate high-level semantic information into multiple feature maps. | Acc 98.2%, Precision 98.92%, Recall 98.24% | MAFA + special dataset 